(54) PRONUNCIATION EVALUATION SYSTEM

(57) A database (16) stores reference voice data at beginner's,
intermediate and advanced levels for each text of a foreign language
textbook. When one of the texts on the lesson screen displayed on a
CRT (26) is selected, the reference voice data corresponding to that
text is read out and the model pronunciation is output from a voice
synthesis unit (34). The user listens to it and imitates the
pronunciation into a microphone (20). A voice recognition unit (22)
obtains voice data by spectrum analysis of the user's voice input from
the microphone (20), and the computer determines the user's
pronunciation level by comparing this data with the reference voice
data of the database (16). If the user's pronunciation is good enough
to be communicated exactly to, and recognized by, the collocutor, a
predetermined success mark is displayed on the screen and practice
proceeds to the next text. If the determination result is bad, practice
is repeated for the same text. By repeating this practice, the user can
judge whether his/her pronunciation would be recognized by a foreigner,
which improves the effectiveness of foreign language pronunciation
learning.
[Front-page figure: lesson screen. Level display: RECOGNITION BEGINNER,
5 POINTS; RECOGNITION INTERMEDIATE, 7 POINTS; RECOGNITION ADVANCED,
8 POINTS. Number of recognition times by text: 10TH. Texts: GOOD
MORNING, JOHN. / HOW ARE YOU? / I AM FINE. AND YOU? / NOT SO GOOD.
I HAVE A CHILL. / BE CAREFUL NOT TO CATCH A COLD. / HAVE A NICE DAY.]

Description

Technical Field 

[0001] The present invention relates to a pronunciation judgment
system that uses a voice recognition function for pronunciation
practice in a foreign language or the like, especially English
conversation, and to a recording medium for storing a computer program
thereof.

Background Art 

[0002] Conventionally, a number of language learning systems for
practicing English conversation or the like have been developed. A
typical system takes the form of an interaction with a computer. Here,
the computer acts as one speaker, displays the face of a collocutor on
the screen, and asks questions to which the user responds. The user's
spoken response is input to the computer and recognized. When it agrees
with the content of the correct answer, the person representing the
collocutor on the screen nods, or some other predetermined display is
executed, and the system proceeds to the next question so as to
continue the conversation.

[0003] However, this system must also examine the content of the
response; hence it is not suitable for simple repeated pronunciation
practice. In short, when the response content is not correct, the
conversation does not continue, and in this case the user cannot
determine whether the content itself was wrong or his/her pronunciation
was wrong. In addition, the user cannot concentrate on pronunciation
practice, being worried about giving a correct answer. Further,
agreement with the correct answer content is determined by comparison
with a single kind of reference voice data representing the answer
content, and the determination criterion is fixed; therefore, when the
content agrees and only the pronunciation disagrees, the user cannot
know how wrong his/her pronunciation was and, hence, cannot tell to
what extent his/her pronunciation would be understood by a foreigner.
In addition, if the level of the reference voice data is too high, the
user cannot pass however many times he/she tries, and may lose
motivation.

[0004] It is an object of the present invention to provide a
pronunciation judgment system that allows the user to know objectively
to what extent his/her pronunciation is recognized by the collocutor,
and a recording medium for storing a computer program thereof.
[0005] Another object of the present invention is to provide a
pronunciation judgment system that allows pronunciation to be practiced
effectively through repeated pronunciation practice of the same text,
with the degree of similarity to the reference pronunciation displayed
each time, and a recording medium for storing a computer program
thereof.

Disclosure of Invention

[0006] The pronunciation judgment system of the present invention
comprises a database for storing reference pronunciation data,
reference voice playback means for outputting a reference voice based
on the reference pronunciation data, similarity determination means for
comparing user pronunciation data, input in correspondence to the
reference voice, with the reference pronunciation data, and means for
informing the user of agreement if the similarity determination means
judges that both data agree.
[0007] In a preferred embodiment, the database may store a plurality
of reference pronunciation data corresponding to pronunciation fluency
levels for the same language. The reference voice playback means may
include a user operation member for selecting the level, and may output
the reference voice of the selected level until the informing means
informs the user of the agreement of both data. The database may store
reference pronunciation data of a plurality of levels for each of a
number of sentences, while the reference voice playback means may
include a user operation member for selecting a sentence and a level,
and may output the reference voice of the selected level for the
selected sentence until the informing means informs the user of the
agreement of both data. The system may further include means for
displaying a sentence corresponding to the reference pronunciation
data.
[0008] The computer readable recording medium of the present invention
records a computer program for causing a computer to execute the steps
of reading out reference voice data from the database, playing back a
reference voice based on the read out reference voice data, judging
similarity by comparing user pronunciation data, input in
correspondence to the reference voice data, with the reference voice
data, and informing the user of the agreement of both data if such
agreement is determined by the similarity determination step.

[0009] In a preferred embodiment, the database may store a plurality
of reference pronunciation data corresponding to pronunciation fluency
levels for the same language. The reference voice playback step may
output the reference voice of the user selected level until the
informing step informs the user of the agreement of both data. The
database may store reference pronunciation data of a plurality of
levels for each of a number of sentences, while the reference voice
playback step may output the reference voice of the user selected level
for the user selected sentence until the informing step informs the
user of the agreement of both data. The program may cause the computer
to execute a step of displaying a sentence corresponding to the
reference pronunciation data.
[0010] The present invention allows the user to judge whether his/her
pronunciation has attained a level at which it is recognized by the
collocutor, and improves the efficiency of language learning
(pronunciation learning) through repetition of this practice.

Brief Description of Drawings
[0011] 

FIG. 1 is a block diagram showing a configuration 
of the pronunciation judgment system according to 
the present invention; 

FIG. 2 is a flow chart showing the flow during the 
pronunciation practice according to the present in- 
vention; and 

FIG. 3 shows an example of a lesson screen.

Best Mode of Carrying Out the Invention 

[0012] Now, an embodiment of the pronunciation judgment system of the
present invention will be described.
[0013] FIG. 1 is a block diagram showing the configuration of the
whole system. A CPU 10 and a CD-ROM drive 12 are connected to a system
bus 14. The system is realized by the CPU 10 executing a computer
program stored on a CD-ROM in the CD-ROM drive 12. A database 16, which
stores reference pronunciation data serving as the model for
pronunciation practice at each of the beginner's, intermediate and
advanced levels, and a level selection unit 18 for selecting the level
of the database 16 are also connected to the system bus 14. The
database 16 is constructed by collecting pronunciation signals
(waveform signals) from a great number of individuals (several hundred
thousand) and averaging the pronunciation data obtained by spectrum
analysis of those signals. Here, the database 16 is included in the
pronunciation practice program, and it may be contained on a CD-ROM and
loaded into the system each time. The beginner's level corresponds to
the pronunciation of a Japanese teacher of English, the advanced level
to the pronunciation of a fluent European or American speaker, and the
intermediate level to the pronunciation of a European or American
speaker who does not speak so fluently. The database need not be
divided into three physical units; it may be divided only functionally.
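
The construction of the database 16 described above can be sketched as
follows. This is only an illustration under stated assumptions: the
specification does not say how the spectrum data is represented or
keyed, so the (text, level) keys, the fixed-length lists of floats and
the names average_spectra and build_database are hypothetical.

```python
from statistics import fmean


def average_spectra(spectra):
    """Average equal-length spectrum vectors element by element."""
    if not spectra:
        raise ValueError("need at least one spectrum")
    length = len(spectra[0])
    if any(len(s) != length for s in spectra):
        raise ValueError("all spectra must have the same length")
    # zip(*spectra) groups the values of each frequency bin across speakers.
    return [fmean(column) for column in zip(*spectra)]


def build_database(recordings):
    """Map each (text, level) key to the average of its speakers' spectra."""
    return {key: average_spectra(spectra) for key, spectra in recordings.items()}


# Hypothetical toy data: two beginner-level speakers, one advanced speaker.
recordings = {
    ("I am fine. And you?", "beginner"): [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]],
    ("I am fine. And you?", "advanced"): [[0.0, 4.0, 0.0]],
}
database = build_database(recordings)
```

Keeping a single mapping keyed by level reflects the remark that the
database may be divided functionally rather than physically.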

[0014] A microphone 20 for inputting the voice waveform pronounced by
the user is connected to the system bus 14 through a voice recognition
unit 22. The voice recognition unit 22 obtains pronunciation data
through spectrum analysis of the input voice waveform. The voice
recognition unit 22 should perform the same spectrum analysis as was
used to obtain the pronunciation data of the database. A CRT 26 is
connected to the system bus 14 through a display controller 24, a mouse
28 and a keyboard 30 are connected through an I/O 32, and a speaker 36
is connected through a voice synthesis unit 34.

[0015] Now, the operation of the present embodiment will be described
with reference to the flow chart shown in FIG. 2. This flow chart shows
the processing flow of the computer program stored on the CD-ROM 12 and
executed by the CPU 10. When operation starts, the lesson screen shown
in FIG. 3 is displayed. This embodiment is assumed to be based on, for
example, an English textbook for junior high school, and to be a
pronunciation practice system for the texts included in the textbook.
The lesson screen comprises a lesson chapter display section 50, an
image display section 52 for an image related to the lesson chapter, a
text display section 54, a pronunciation level display section 56, and
a display section 58 showing the number of times of practice per text.
The lesson chapter display section 50 displays right and left
triangular icons, which allow a lesson chapter to be selected by
operating them with the mouse 28. The text display section 54 shows a
plurality of texts; a square icon showing the text selection state is
displayed at the left of each text, and a heart mark icon showing a
good pronunciation level determination result is displayed at the
right. The heart mark icon is a success mark that is displayed when the
student can pronounce the text similarly to the model pronunciation
(which is divided into three levels). The level display section 56 also
displays a score (out of 10) for each level; however, this score is
nothing more than a guide indicating the difficulty of the respective
levels. In the example of FIG. 3, the beginner's level is selected.

[0016] In step S10, the lesson chapter is selected. In step S12, the
level is selected by clicking any level line with the mouse. Here, the
beginner's level is selected. In step S14, the text is selected. In the
example of FIG. 3, the third text, "I am fine. And you?", is selected.

[0017] In step S16, the beginner's level reference pronunciation data
of the selected text is read out from the database 16, and the voice is
synthesized by the voice synthesis unit 34 and output from the speaker
36 as the model pronunciation. The model pronunciation may be output
not just once but several times, and the output speed may be varied
between outputs.

[0018] In step S18, the user pronounces the text, imitating this model
voice. The user's voice waveform is input to the voice recognition unit
22 through the microphone 20. The voice recognition unit 22 obtains the
pronunciation data through spectrum analysis of this voice signal.
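
The specification does not say which spectrum analysis the voice
recognition unit 22 performs. As a minimal sketch, assuming a plain
discrete Fourier transform of one frame of samples, the magnitude
spectrum could be computed as below; the function name
magnitude_spectrum and the test frame are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Return DFT magnitudes for bins 0..N/2 of a real-valued frame.

    Naive O(N^2) transform, adequate only for short illustrative frames.
    """
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples)
        )
        spectrum.append(abs(acc))
    return spectrum


# Hypothetical test frame: a sine wave completing 2 cycles in 16 samples.
frame = [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)]
spectrum = magnitude_spectrum(frame)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
```

Whatever analysis is chosen, the same function would have to be applied
to the reference recordings and to the user's input so that the two
sets of pronunciation data are comparable.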
[0019] In step S20, the user pronunciation data and the reference
voice data stored in the database 16 are compared to obtain the degree
of similarity. The higher this similarity is, the closer the user's
pronunciation is to the reference voice, indicating that the user
speaks well and that his/her pronunciation is more likely to be
communicated exactly to the collocutor and recognized correctly.
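
The similarity measure used in step S20 is not specified. One common
choice, shown here purely as an assumption, is the cosine similarity
between the user's spectrum vector and the reference spectrum vector,
which is 1.0 for identical spectral shapes and falls toward 0.0 as they
diverge.

```python
import math


def cosine_similarity(user, reference):
    """Cosine of the angle between two equal-length spectrum vectors."""
    if len(user) != len(reference):
        raise ValueError("vectors must have the same length")
    dot = sum(u * r for u, r in zip(user, reference))
    norm_user = math.sqrt(sum(u * u for u in user))
    norm_reference = math.sqrt(sum(r * r for r in reference))
    if norm_user == 0.0 or norm_reference == 0.0:
        return 0.0  # silence carries no spectral shape to compare
    return dot / (norm_user * norm_reference)


# Hypothetical magnitude spectra for a reference and two user attempts.
reference = [0.0, 8.0, 1.0, 0.0]
close_attempt = [0.0, 4.0, 0.5, 0.0]   # same shape, quieter
poor_attempt = [5.0, 0.0, 0.0, 3.0]    # energy in different bins
```

With this measure a quieter but otherwise matching attempt still scores
highly, because only the shape of the spectrum is compared, not its
loudness.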

[0020] In step S22, it is determined whether this similarity is higher
than a predetermined similarity, that is, whether the pronunciation of
this text has obtained the passing mark and succeeded. If the passing
mark is not obtained, processing returns to step S16, the reference
voice of the same text is output from the speaker 36 again, and the
user repeats the pronunciation practice.
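
The repeat-until-pass behavior of steps S16 to S22 can be sketched as a
simple loop. The threshold value 0.8 and the function name
practice_text are assumptions; the specification says only that the
similarity is compared with a predetermined similarity. Playback and
recording are replaced here by a list of already computed similarities.

```python
def practice_text(attempt_similarities, passing_similarity=0.8):
    """Return (passed, attempts_used) for one text.

    Each element of attempt_similarities stands for one round of
    S16 (play model), S18 (record user) and S20 (compute similarity).
    """
    attempts_used = 0
    for similarity in attempt_similarities:
        attempts_used += 1
        if similarity > passing_similarity:  # S22: passing mark obtained
            return True, attempts_used
        # Otherwise fall through and repeat from S16 with the same text.
    return False, attempts_used


result = practice_text([0.5, 0.7, 0.9])
```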

[0021] If one text is passed, in step S24, it is determined whether
all texts of the chapter have been passed. If there is any text that
has not been passed, processing returns to step S14, another text is
selected, and the user repeats the pronunciation practice.

[0022] If all texts are passed, in step S26, it is determined whether
all levels have been passed. If there is any level that has not been
passed, processing returns to step S12, another level is selected, and
the user repeats the pronunciation practice for all texts at that
level.
[0023] If all levels are passed, in step S28, it is determined whether
the other chapters have also been passed. If there is any chapter that
has not been passed, processing returns to step S10, another chapter is
selected, and the user repeats the pronunciation practice for all texts
and all levels of that chapter.
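
Taken together, steps S10 to S28 form three nested loops over chapters,
levels and texts, with a repeat loop around each text. The sketch below
is a simplification: in the embodiment the user chooses the order with
the mouse, whereas here the order is fixed, and the names run_course
and pass_on_second_try are hypothetical. The attempt callback stands in
for steps S16 to S22 and must eventually return True, or the loop will
not terminate.

```python
from collections import Counter

LEVELS = ("beginner", "intermediate", "advanced")


def run_course(course, attempt, levels=LEVELS):
    """Practice every text of every chapter at every level.

    course maps a chapter name to its list of texts; attempt(chapter,
    level, text) returns True when the passing mark is obtained.
    """
    log = []
    for chapter, texts in course.items():              # S10 / S28
        for level in levels:                           # S12 / S26
            for text in texts:                         # S14 / S24
                tries = 1
                while not attempt(chapter, level, text):   # S16 to S22
                    tries += 1
                log.append((chapter, level, text, tries))
    return log


# Stub learner who fails the first attempt at each text and then passes.
calls = Counter()


def pass_on_second_try(chapter, level, text):
    calls[(chapter, level, text)] += 1
    return calls[(chapter, level, text)] >= 2


log = run_course(
    {"Lesson 1": ["How are you?", "I am fine. And you?"]},
    pass_on_second_try,
)
```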

[0024] As described above, in the present embodiment, the text is
displayed and the reference voice is pronounced by a computer, while
the student imitates this pronunciation and inputs it through the
microphone 20. The computer then determines the similarity between the
reference voice data and the student's input voice data; if the
similarity is lower than a predetermined value, the student is made to
repeat the pronunciation practice, and when it becomes higher than the
predetermined value, a success mark is displayed. Thus, pronunciation
can be practiced effectively, because the practice can be repeated as
often as desired for the same text and the pronunciation level
determination result is displayed each time. In addition, the reference
voice data is not limited to one kind but comprises three kinds:
beginner's level pronunciation data, which is the pronunciation of a
Japanese teacher; advanced level pronunciation data, which is the
pronunciation of a particularly fluent native speaker; and intermediate
level pronunciation data, which is the pronunciation of a foreign
speaker who does not speak so fluently. This allows the user to improve
pronunciation gradually from the beginner's level through the
intermediate level to the advanced level, avoids the case where the
user cannot succeed however many times he/she tries because the level
is too high, and prevents him/her from losing motivation.
[0025] The present invention is not limited to the embodiment
described above, and various modifications can be made. For example,
the lesson screen essentially needs only the success mark; the other
displays are entirely optional. Further, in addition to displaying the
success mark, the similarity to the reference voice may be scored, even
in the case of failure. Here, the reference pronunciation and the user
pronunciation are performed alternately; however, it is preferable to
make the user pronounce while hearing the reference pronunciation.
Instead of storing averaged voice data of a number of persons (data
after spectrum analysis), the reference voice database can store the
voice waveform of a particular speaker as it is. In this case, the
voice synthesis unit 34 at the stage preceding the speaker 36 is not
necessary. Instead, the voice waveform signal read out from the
database must be subjected to spectrum analysis by the voice
recognition unit 22, in the same way as the user's input voice signal
from the microphone, and then compared with the user's input voice
data. The object of practice is not limited to English and may include
Chinese or the like; nor is it limited to foreign languages, and it may
include Japanese (the national language) or the like. In addition, the
corresponding Japanese may be displayed at the same time under the
English text display. Further, instead of providing a database for each
of the three levels, the system may be constructed to use a single
database in which only the level is changed. For the present invention
it is sufficient to obtain the effect of repeated practice, and it is
not always necessary to divide the reference pronunciation into a
plurality of levels.
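
The modification in which the similarity is scored even on failure
could look like the sketch below. The 0 to 10 scale, the assumption
that similarity lies between 0 and 1, the threshold of 0.8 and the
names similarity_to_score and report are all hypothetical; the
specification states only that the similarity may be scored.

```python
def similarity_to_score(similarity, max_score=10):
    """Map a similarity in [0, 1] to an integer score from 0 to max_score."""
    clamped = min(1.0, max(0.0, similarity))
    return round(clamped * max_score)


def report(similarity, passing_similarity=0.8):
    """Build the feedback for one attempt, whether or not it passed."""
    return {
        "passed": similarity > passing_similarity,
        "score": similarity_to_score(similarity),
    }


failed_attempt = report(0.72)
passed_attempt = report(0.93)
```

Showing a numeric score on a failed attempt lets the user see whether
successive attempts are getting closer to the reference.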

Industrial Applicability 

[0026] As mentioned above, the present invention provides a
pronunciation judgment system capable of determining whether one's
pronunciation is recognized by the collocutor, and a recording medium
for storing a computer program thereof. In addition, the present
invention can provide a pronunciation judgment system, and a recording
medium storing a computer program thereof, that allow the user to
practice pronunciation effectively through repeated practice of the
same text, and to practice effectively alone until a predetermined
similarity level is obtained, by comparing the user's pronunciation
with the reference voice each time, determining whether it agrees with
the reference, and displaying how closely it resembles the reference
pronunciation.



Claims

1. A pronunciation judgment system comprising:

a database for storing reference pronunciation data;

reference voice playback means for outputting a reference voice based
on said reference pronunciation data;

similarity determination means for comparing a user pronunciation data
input in correspondence to said reference voice and said reference
pronunciation data; and

means for informing a user of a result of determination made by said
similarity determination means.

2. The pronunciation judgment system according to claim 1,
characterized in that said database stores a plurality of reference
pronunciation data corresponding to a pronunciation fluency level, for
the same language.

3. The pronunciation judgment system according to claim 2,
characterized in that said reference voice playback means includes a
user operative member for selecting a level, and outputs a selected
level reference voice, until said similarity determination means
detects agreement of both data.

4. The pronunciation judgment system according to claim 1,
characterized in that said database stores reference pronunciation data
of a plurality of levels for each of a number of sentences, and said
reference voice playback means includes a user operative member for
selecting a sentence and a level and outputs a selected level reference
voice of a selected sentence, until said similarity determination means
detects agreement of both data.

5. The pronunciation judgment system according to 
claim 1, characterized by further comprising 
means for displaying a sentence corresponding to 
the reference pronunciation data. 

6. The pronunciation judgment system according to 
claim 1, characterized in that said informing 
means informs of the agreement of both data. 

7. A computer readable recording medium for storing 
a program for causing a computer to execute the 
steps of: 

reading out reference voice data from a data- 
base; 

playing back a reference voice based on the 
read out reference voice data; 
determining a similarity by comparing user pro- 
nunciation data input in correspondence to said 
reference voice data and said reference voice 
data; and 

informing a user of a result of determination 
made by said similarity determination step. 

8. The recording medium according to claim 7, char- 
acterized in that said database stores a plurality of 
reference pronunciation data corresponding to a 
pronunciation fluency level, for the same language. 

9. The recording medium according to claim 7, char- 
acterized in that said reference voice playback 
step outputs a user selected level reference voice, 
until said similarity determination step detects 
agreement of both data. 

10. The recording medium according to claim 7, characterized in that
said database stores reference pronunciation data of a plurality of
levels for each of a number of sentences, and said reference voice
playback step outputs a user selected level reference voice of a user
selected sentence, until said similarity determination step detects
agreement of both data.

11. The recording medium according to claim 7, characterized in that
said program causes a computer to execute also a step for displaying a
sentence corresponding to the reference pronunciation data.

12. The recording medium according to claim 7, characterized in that
said informing step informs of agreement of both data.